Kenneth Tay
Oct 1, 2019
http://web.stanford.edu/~kjytay/courses/stats32-aut2019/
classes <- list(quarter = "Fall 2018/19",
ID = c("STATS 32", "STATS 101", "STATS 200"),
credits = 12)
classes$ID
## [1] "STATS 32" "STATS 101" "STATS 200"
classes$credits
## [1] 12
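As a small sketch of the list access shown above, the following compares the three common ways of pulling elements out of a list. It reuses the `classes` list from the slide; nothing else is assumed.

```r
# Rebuild the list from the slide
classes <- list(quarter = "Fall 2018/19",
                ID = c("STATS 32", "STATS 101", "STATS 200"),
                credits = 12)

names(classes)       # names of the list elements
classes$ID           # element by name: a character vector
classes[["ID"]]      # same element, using double brackets
classes["ID"]        # single brackets return a list of length 1
classes$ID[2]        # index into the extracted vector
```

Single brackets keep the list wrapper, while `$` and `[[` return the element itself. This difference matters once you start passing the result to other functions.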
Data frames are a special type of list:
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
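To make the "special type of list" point concrete, here is a minimal sketch using the built-in `mtcars` dataset (the one whose structure is printed above). Each column is a list element, and all elements have the same length.

```r
is.list(mtcars)          # a data frame is a list...
is.data.frame(mtcars)    # ...with extra structure
length(mtcars)           # number of list elements = number of columns

# List-style access works on columns
head(mtcars$mpg)
identical(mtcars$mpg, mtcars[["mpg"]])

# Every column has one entry per row
sapply(mtcars, length)
```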
str, summary
head, tail
names, dim, nrow, ncol
table
mean, median, sd, var
factor
ggplot2 (and the + syntax)
“The simple graph has brought more information to the data analyst’s mind than any other device.” - John Tukey
## mpg weight cylinders
## 1 21.0 2.620 6
## 2 21.0 2.875 6
## 3 22.8 2.320 4
## 4 21.4 3.215 6
## 5 18.7 3.440 8
## 6 18.1 3.460 6
## 7 14.3 3.570 8
## 8 24.4 3.190 4
## 9 22.8 3.150 4
## 10 19.2 3.440 6
## 11 17.8 3.440 6
## 12 16.4 4.070 8
## 13 17.3 3.730 8
## 14 15.2 3.780 8
## 15 10.4 5.250 8
## 16 10.4 5.424 8
## 17 14.7 5.345 8
## 18 32.4 2.200 4
## 19 30.4 1.615 4
## 20 33.9 1.835 4
## 21 21.5 2.465 4
## 22 15.5 3.520 8
## 23 15.2 3.435 8
## 24 13.3 3.840 8
## 25 19.2 3.845 8
## 26 27.3 1.935 4
## 27 26.0 2.140 4
## 28 30.4 1.513 4
## 29 15.8 3.170 8
## 30 19.7 2.770 6
## 31 15.0 3.570 8
## 32 21.4 2.780 4
What is the distribution of cylinders in my dataset?
What is the distribution of miles per gallon in my dataset?
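A possible way to draw these two distribution plots is sketched below. It assumes the table shown earlier was built from the `mpg`, `wt` and `cyl` columns of `mtcars`; the name `cars_df` is made up for this illustration. A bar chart suits the discrete cylinder counts, and a histogram suits the continuous mpg values.

```r
library(ggplot2)

# Assumed reconstruction of the three-column dataset
cars_df <- data.frame(mpg = mtcars$mpg,
                      weight = mtcars$wt,
                      cylinders = mtcars$cyl)

# Discrete variable: one bar per number of cylinders
p_bar <- ggplot(cars_df) +
  geom_bar(aes(x = factor(cylinders))) +
  labs(x = "cylinders")

# Continuous variable: histogram (the bin count is an arbitrary choice)
p_hist <- ggplot(cars_df) +
  geom_histogram(aes(x = mpg), bins = 10)

print(p_bar)
print(p_hist)
```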
What is the relationship between mpg and weight?
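For two continuous variables, a scatterplot is the usual starting point. The sketch below assumes the dataset is the `mpg`/`wt`/`cyl` subset of `mtcars`; `cars_df` is a hypothetical name.

```r
library(ggplot2)

cars_df <- data.frame(mpg = mtcars$mpg,
                      weight = mtcars$wt,
                      cylinders = mtcars$cyl)

# One point per car: weight on the x-axis, mpg on the y-axis
p_scatter <- ggplot(cars_df, aes(x = weight, y = mpg)) +
  geom_point()

print(p_scatter)
```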
What is the relationship between mpg and time?
Not so good…
Easier to see the trend
For each value of cylinder, what is the distribution of mpg like?
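One way to compare a continuous distribution across groups is a boxplot per group. This is only a sketch under the same assumption about the data (`cars_df` is a hypothetical name for the `mtcars` subset).

```r
library(ggplot2)

cars_df <- data.frame(mpg = mtcars$mpg,
                      weight = mtcars$wt,
                      cylinders = mtcars$cyl)

# factor() makes ggplot2 treat cylinders as categories, giving one box each
p_box <- ggplot(cars_df, aes(x = factor(cylinders), y = mpg)) +
  geom_boxplot() +
  labs(x = "cylinders")

print(p_box)
```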
How often does each pair of cylinder and gear occur in the dataset?
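For two discrete variables, a two-way table gives the counts directly, and `geom_count` is one possible way to plot them, with point size showing frequency. The sketch uses the `cyl` and `gear` columns of `mtcars`.

```r
library(ggplot2)

# Two-way contingency table of cylinder and gear counts
counts <- table(cylinders = mtcars$cyl, gears = mtcars$gear)
print(counts)

# Point size is proportional to the number of cars with each pair
p_count <- ggplot(mtcars) +
  geom_count(aes(x = factor(cyl), y = factor(gear))) +
  labs(x = "cylinders", y = "gears")

print(p_count)
```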
I have father-son pairs. For each pair, I record each person's height, weight and ethnicity. I want to study the relationship between the characteristics of the father and those of the son. What plots could help me?
ggplot2
ggplot2 package
ggplot2 reference manual
Data: Dataset we are using for the plot
## mpg weight cylinders
## 1 21.0 2.620 6
## 2 21.0 2.875 6
## 3 22.8 2.320 4
## 4 21.4 3.215 6
## 5 18.7 3.440 8
## 6 18.1 3.460 6
## 7 14.3 3.570 8
## 8 24.4 3.190 4
## 9 22.8 3.150 4
## 10 19.2 3.440 6
Geometries: Visual elements used for our data
Geom: point
Aesthetics: Defines the data columns which affect various aspects of the geom
3 different aesthetics:
ggplot2 code
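The slide's own code and its list of aesthetics did not survive extraction, so the following is only a sketch of how the three layers (data, geometry, aesthetics) could fit together. The choice of x position, y position and color as the three aesthetics is an assumption, as is the name `cars_df`.

```r
library(ggplot2)

# Data: the dataset used for the plot (assumed mtcars subset)
cars_df <- data.frame(mpg = mtcars$mpg,
                      weight = mtcars$wt,
                      cylinders = mtcars$cyl)

p <- ggplot(data = cars_df) +                    # data layer
  geom_point(                                    # geometry: points
    mapping = aes(x = weight,                    # aesthetic 1: x position
                  y = mpg,                       # aesthetic 2: y position
                  color = factor(cylinders))     # aesthetic 3: color
  ) +
  labs(color = "cylinders")

print(p)
```

Each piece is joined with `+`, so a different geom or an extra aesthetic can be swapped in without touching the rest.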
Optional material
One graphic contains:
Sometimes we need to tweak the position of the geometric elements because they obscure each other.
Only 9 data points??
Much better
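The overplotting problem can be sketched as follows. The dataset behind the slide's plot is not recoverable, so this example uses `cyl` and `gear` from `mtcars` as a stand-in: many cars share the same pair of values, so a plain scatterplot shows far fewer points than there are rows. Jittering adds a small random offset to each point.

```r
library(ggplot2)

# Number of distinct (cyl, gear) pairs: far fewer than the 32 rows
nrow(unique(mtcars[, c("cyl", "gear")]))

# Points sit on top of each other
p_overplot <- ggplot(mtcars, aes(x = cyl, y = gear)) +
  geom_point()

# Same plot with jittered positions; the widths are arbitrary choices
p_jitter <- ggplot(mtcars, aes(x = cyl, y = gear)) +
  geom_jitter(width = 0.2, height = 0.2)

print(p_overplot)
print(p_jitter)
```

Jittered positions are random, so the plot differs slightly on each run unless you call `set.seed()` first.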
Default colors
Manually chosen colors
rgb(0,0,1), rgb(1,0,0), rgb(0,0,0), rgb(1,1,1)
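As a brief sketch, `rgb()` takes red, green and blue intensities between 0 and 1 and returns a hex color string. Those strings can be passed to `scale_color_manual()` to replace the default colors. The mapping of colors to cylinder groups below is an arbitrary choice for illustration.

```r
library(ggplot2)

rgb(0, 0, 1)   # "#0000FF" (blue)
rgb(1, 0, 0)   # "#FF0000" (red)
rgb(0, 0, 0)   # "#000000" (black)
rgb(1, 1, 1)   # "#FFFFFF" (white)

# Manually chosen colors, one per cylinder group
my_colors <- c("4" = rgb(0, 0, 1), "6" = rgb(1, 0, 0), "8" = rgb(0, 0, 0))

p_colors <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  scale_color_manual(values = my_colors) +
  labs(color = "cylinders")

print(p_colors)
```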